1. A method for searching a collection of items, wherein each item in the
collection has a set of properties, comprising the steps of:

obtaining a query composed of a first set of one or more properties; and
obtaining a result based on applying a distance function to one or more of the items in
the collection, wherein

the distance function determines a distance between the query and an item in the
collection based on the number of items in the collection that are associated with all of
the properties in the intersection of the first set of properties and the set of properties for
the item.
2. The method of claim 1, further including the step of associating each item
in the collection with a set of properties.

3. The method of claim 1, wherein the step of obtaining a result includes
identifying result items whose distance from the query is within a first threshold.

4. The method of claim 3, wherein the step of obtaining a result includes
ranking the result items according to their distance from the query.

5. The method of claim 3, wherein the threshold is defined as a number of 
result items. 

6. The method of claim 3, wherein the threshold is defined as a distance.

7. The method of claim 1, further including the step of returning the result.

8. The method of claim 1, wherein the step of obtaining a query includes the
step of mapping a received query to a set of one or more properties.
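The distance of claim 1, together with the ranking and thresholds of claims 3 to 6, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims only require that the distance depend on the number of items sharing every property in the intersection, so taking log2 of that count is an assumption, and the names `support`, `distance`, and `search` are hypothetical. Under this choice, a rarer combination of shared properties yields a smaller distance.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical representation: item id -> set of properties.
Collection = Dict[str, FrozenSet[str]]


def support(collection: Collection, props: FrozenSet[str]) -> int:
    """Count the items associated with *all* of the given properties.

    An empty property set is contained in every item, so its support is
    the size of the whole collection.
    """
    return sum(1 for item_props in collection.values() if props <= item_props)


def distance(collection: Collection, query: FrozenSet[str], item: str) -> float:
    """Assumed distance (not from the claims): log2 of the support of the
    properties shared by the query and the item.

    The item itself has all of its shared properties, so the support is
    at least 1 and the logarithm is defined.
    """
    common = query & collection[item]
    return math.log2(support(collection, common))


def search(
    collection: Collection,
    query: FrozenSet[str],
    max_items: Optional[int] = None,
    max_distance: Optional[float] = None,
) -> List[Tuple[float, str]]:
    """Rank items by distance (ties broken by id), then apply thresholds."""
    ranked = sorted((distance(collection, query, item), item) for item in collection)
    if max_distance is not None:
        ranked = [(d, item) for d, item in ranked if d <= max_distance]
    if max_items is not None:
        ranked = ranked[:max_items]
    return ranked
```

Here `max_items` plays the role of claim 5's threshold on the number of result items and `max_distance` that of claim 6's threshold on distance; the sorted list is the ranking of claim 4.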

9. The method of claim 1, wherein one or more of the properties are binary. 

10. The method of claim 1, wherein one or more of the properties are related
by a partial order, and wherein, if an item is associated with a property, then the item is
also associated with all ancestors of that property in the partial order.
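The upward-closure condition of claim 10 can be enforced when properties are assigned to an item. The sketch below assumes the partial order is supplied as a hypothetical `parents` mapping from each property to its direct parents and computes the full ancestor set with an iterative depth-first traversal; it illustrates the condition and is not a description of the claimed system.

```python
from typing import Dict, Iterable, Set

# Hypothetical encoding of the partial order: property -> its direct parents.
Parents = Dict[str, Set[str]]


def with_ancestors(props: Iterable[str], parents: Parents) -> Set[str]:
    """Return the given properties together with all of their ancestors."""
    closed: Set[str] = set()
    stack = list(props)
    while stack:
        prop = stack.pop()
        if prop in closed:
            continue  # already expanded (a shared ancestor in a diamond)
        closed.add(prop)
        stack.extend(parents.get(prop, ()))
    return closed
```

With the closure applied, two items tagged with different children of a common ancestor still share that ancestor, so the intersection used by the distance function is non-empty.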
11. The method of claim 10, wherein one or more of the properties represent
numerical values or ranges, and wherein the partial order reflects a set of containment
relationships among the numerical values or ranges.

12. The method of claim 1, wherein the properties are grouped into
equivalence classes.

13. The method of claim 12, further including the step of grouping the
properties into equivalence classes using clustering.

14. The method of claim 13, wherein each property has a set of subproperties,
wherein the clustering is performed such that the distance between two properties in the
collection is correlated to the number of properties in the collection that are associated
with all of the subproperties common to both properties.

15. The method of claim 1, wherein the query corresponds to a single item in
the collection.

16. The method of claim 1, wherein the query corresponds to a plurality of
items in the collection.

17. The method of claim 1, wherein the query is independent of the items in 
the collection. 

18. The method of claim 1, wherein the step of obtaining a result is
constrained to a subcollection of the items in the collection.

19. The method of claim 18, wherein the subcollection is specified as an
expression of properties.

20. The method of claim 19, wherein the expression includes a subset of the
set of properties that compose the query.

21. The method of claim 1, wherein the step of obtaining a query includes
identifying certain properties to be ignored in the step of obtaining a result.
22. The method of claim 1, wherein the distance function is applied explicitly.

23. The method of claim 1, wherein the distance function is applied implicitly.

24. The method of claim 23, wherein the step of obtaining a result includes the 
step of iterating a random walk process to select potential result items. 

25. The method of claim 24, wherein the step of obtaining a result includes 
ranking the potential result items by frequency and selecting the potential result items 
having higher frequencies.
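Claims 24 and 25 leave the walk itself unspecified. One plausible reading, assumed here, is a walk on the bipartite graph linking items to their properties: start at a query property, hop to a random item that has it, hop to a random property of that item, and so on, counting how often each item is visited, then keep the most frequent items. The function name and the parameters `k`, `n_walks`, `walk_length`, and `seed` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Optional


def random_walk_search(
    collection: Dict[str, FrozenSet[str]],
    query: FrozenSet[str],
    k: int = 3,
    n_walks: int = 200,
    walk_length: int = 4,
    seed: Optional[int] = 0,
) -> List[str]:
    """Approximate search: rank items by visit frequency over random walks.

    Assumed walk: query property -> random item with that property ->
    random property of that item -> random item with that property -> ...
    """
    rng = random.Random(seed)
    # Inverted index: property -> items associated with it.
    index: Dict[str, List[str]] = {}
    for item, props in collection.items():
        for prop in props:
            index.setdefault(prop, []).append(item)
    # Only query properties that some item actually has can start a walk.
    starts = sorted(p for p in query if p in index)
    if not starts:
        return []
    visits: Counter = Counter()
    for _ in range(n_walks):
        prop = rng.choice(starts)
        for _ in range(walk_length):
            item = rng.choice(index[prop])
            visits[item] += 1
            # Sorting makes the walk reproducible for a fixed seed.
            prop = rng.choice(sorted(collection[item]))
    # Claim 25: keep the potential result items with the highest frequencies.
    return [item for item, _ in visits.most_common(k)]
```

Raising `n_walks` improves the frequency estimates at the cost of more work, which is one way to realize the accuracy versus performance trade-off mentioned in claim 36.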

26. The method of claim 23, wherein the step of obtaining a result includes
iterating through one or more subsets of the query and identifying items associated with
the one or more subsets.

27. The method of claim 26, wherein the one or more subsets are prioritized
according to the number of items in the collection that have all of the properties in each
subset and wherein iterating through one or more subsets of the query is continued until a
first threshold is reached.
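Claims 26 and 27 describe an implicit search over subsets of the query. A minimal sketch, assuming a small query (every non-empty subset is enumerated, which is exponential in the query size) and assuming the threshold is a target number of result items: subsets are visited in increasing order of the number of items having all of their properties, and items are collected until the threshold is met. Because the properties an item shares with the query form one of these subsets, items come out in nondecreasing order of that count, matching a distance that grows with it. The name `subset_search` is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def subset_search(
    collection: Dict[str, FrozenSet[str]],
    query: FrozenSet[str],
    threshold: int,
) -> List[str]:
    """Collect items via query subsets, most specific subsets first."""
    # Support (number of items having all properties) of every non-empty subset.
    support: Dict[FrozenSet[str], int] = {}
    for r in range(1, len(query) + 1):
        for combo in combinations(sorted(query), r):
            subset = frozenset(combo)
            support[subset] = sum(
                1 for props in collection.values() if subset <= props
            )
    # Fewest matching items first; among equal counts, larger subsets first.
    ordered = sorted(
        (s for s, n in support.items() if n > 0),
        key=lambda s: (support[s], -len(s), sorted(s)),
    )
    results: List[str] = []
    for subset in ordered:
        for item in sorted(collection):
            if subset <= collection[item] and item not in results:
                results.append(item)
        if len(results) >= threshold:
            break  # threshold reached (checked after each whole subset)
    return results
```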

28. The method of claim 1, wherein the step of obtaining a result includes
applying a Euclidean distance function.

29. The method of claim 28, wherein the step of obtaining a result includes
merging a first result determined by applying the distance function and a second result
determined by applying the Euclidean distance function.

30. The method of claim 28, wherein the step of obtaining a result includes
determining a first result by applying either the distance function or the Euclidean
distance function and applying the other distance function to the first result.
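Claim 30 can be illustrated as a two-stage search: a Euclidean filter over numeric features produces a first result, which is then re-ranked by the property-based distance. The record layout (a property set plus a feature vector), the `radius` cutoff, the log2 form of the property distance, and the name `two_stage_search` are assumptions made for this sketch.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical record layout: (property set, numeric feature vector).
Record = Tuple[FrozenSet[str], Sequence[float]]


def two_stage_search(
    collection: Dict[str, Record],
    query_props: FrozenSet[str],
    query_vec: Sequence[float],
    radius: float,
) -> List[str]:
    """Euclidean filter first, then re-rank survivors by property distance."""
    # Stage 1: keep items within `radius` of the query vector.
    candidates = [
        item
        for item, (_, vec) in collection.items()
        if math.dist(vec, query_vec) <= radius
    ]

    def prop_distance(item: str) -> float:
        # Assumed form: log2 of how many items in the whole collection
        # have every property shared by the query and this item.
        common = query_props & collection[item][0]
        count = sum(1 for props, _ in collection.values() if common <= props)
        return math.log2(count)

    # Stage 2: order the first result by the property-based distance.
    return sorted(candidates, key=lambda item: (prop_distance(item), item))
```

The counts behind the property distance are taken over the whole collection, as the claims specify, not just over the Euclidean candidates.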

31. A method for analyzing two sets of properties from a plurality of sets of
properties, comprising the steps of:

determining a set of common properties in the intersection of the two sets of
properties;

determining the number of sets of properties from the plurality of sets of
properties that include the set of common properties; and

assessing the distance between the two sets of properties as a function of the
number of sets of properties that include the set of common properties.

32. A method for analyzing the relationship between two items in a collection
of items, wherein each item in the collection is associated with a set of properties,
comprising the steps of:

obtaining a set of properties with which the two items are commonly associated; and

determining the degree of commonality between the two items as a function of the
number of items in the collection that are associated with all of the properties with which
the two items are commonly associated.

33. A computer program product residing on a computer readable medium,
for use in searching a collection of items, the computer program product comprising
instructions for causing a computer to:

receive a query composed of one or more properties; and
obtain a result based on applying a distance function to one or more items in the
collection, wherein

the distance function determines a distance between the query and an item in the
collection based on the number of items in the collection that are associated with all of
the properties in the intersection of the first set of properties and the set of properties for
the item.

34. The computer program product of claim 33, wherein the instructions cause
the computer to obtain a result by identifying exactly the items whose distance from the
query is within a threshold.

35. The computer program product of claim 33, wherein the instructions cause
the computer to obtain a result by identifying approximately the items whose distance
from the query is within a threshold according to a heuristic.

36. The computer program product of claim 35, wherein the heuristic permits
a trade-off between the accuracy and the performance of a search.

37. The computer program product of claim 35, wherein the heuristic
includes the use of a random walk process.

38. A computer system for managing data records comprising:

an information retrieval subsystem that stores and retrieves data records, each data
record being associated with a set of properties; and

a similarity search subsystem that receives similarity search queries and processes
similarity search queries based on a distance function, a similarity search query being
associated with a first set of properties, wherein

the distance function determines a distance between the query and a data record in
the collection based on the number of data records in the collection that are associated
with all of the properties in the intersection of the first set of properties and the set of
properties for the data record.

39. The computer system of claim 38, further including a clustering subsystem
that employs the distance function of the similarity search subsystem to construct a graph.

40. A method for applying a matching algorithm to a collection of items, each
item being associated with a set of properties, comprising the steps of:

constructing a graph having nodes that correspond to items, and having edges that
correspond to pairs of items, wherein each edge has a cost correlated to the number of
items in the collection that are associated with all of the properties in the intersection of
the sets of properties for the two items that the edge links; and

identifying a subset of the edges that constitutes a minimum-cost matching with
respect to the graph.
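For small collections, the minimum-cost matching of claim 40 can be found exactly by exhaustive search. This sketch assumes the edge cost is the count itself (the claim says only that the cost is correlated with it), assumes a complete graph and a perfect matching, and memoizes over the remaining items; it is exponential and only meant to make the construction concrete. The name `min_cost_matching` is hypothetical, and a production system would more likely use a polynomial-time method such as Edmonds' blossom algorithm.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple

Pairs = Tuple[Tuple[str, str], ...]


def min_cost_matching(
    collection: Dict[str, FrozenSet[str]],
) -> Tuple[int, List[Tuple[str, str]]]:
    """Exact minimum-cost perfect matching on the complete item graph."""
    nodes = tuple(sorted(collection))
    if len(nodes) % 2:
        raise ValueError("a perfect matching needs an even number of items")

    def cost(u: str, v: str) -> int:
        # Assumed edge cost: number of items having every shared property.
        common = collection[u] & collection[v]
        return sum(1 for props in collection.values() if common <= props)

    @lru_cache(maxsize=None)
    def best(remaining: Tuple[str, ...]) -> Tuple[int, Pairs]:
        if not remaining:
            return 0, ()
        # Pair the first remaining item with each other item in turn.
        u, rest = remaining[0], remaining[1:]
        options = []
        for j, v in enumerate(rest):
            sub_cost, sub_pairs = best(rest[:j] + rest[j + 1:])
            options.append((cost(u, v) + sub_cost, ((u, v),) + sub_pairs))
        return min(options)

    total, pairs = best(nodes)
    return total, list(pairs)
```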




41. A method for applying a clustering algorithm to a collection of items, each
item being associated with a set of properties, comprising the steps of:

constructing a graph having nodes that correspond to items, and having edges that
correspond to pairs of items, wherein each edge has a cost correlated to the number of
items in the collection that are associated with all of the properties in the intersection of
the sets of properties for the two items that the edge links; and

identifying a collection of subsets of the edges that constitutes a minimum-cost
clustering with respect to the graph.